Overview

Dataset statistics

Number of variables: 11
Number of observations: 2930
Missing cells: 211
Missing cells (%): 0.7%
Duplicate rows: 4
Duplicate rows (%): 0.1%
Total size in memory: 251.9 KiB
Average record size in memory: 88.0 B
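The missing-cell percentage is taken over every cell in the table, not per column: 211 / (11 × 2930) = 211 / 32,230 ≈ 0.65%, which is shown as 0.7%. The sketch below reproduces that calculation. It is not pandas-profiling's own code; it assumes the table is held as a list of equal-length tuples with `None` marking a missing cell (a hypothetical representation chosen to keep the example dependency-free).

```python
def missing_cell_stats(rows):
    """Return (missing_cells, missing_pct) for a list of equal-length tuples.

    Assumptions: None marks a missing cell, and the percentage is taken over
    all cells, i.e. n_observations * n_variables.
    """
    n_obs = len(rows)
    n_vars = len(rows[0]) if rows else 0
    total_cells = n_obs * n_vars
    missing = sum(1 for row in rows for cell in row if cell is None)
    pct = 100.0 * missing / total_cells if total_cells else 0.0
    return missing, pct


# Check the reported overview figure: 211 missing cells, 11 columns, 2930 rows.
print(round(100 * 211 / (11 * 2930), 1))
```

Running the last line prints 0.7, matching the "Missing cells (%)" row above.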

Variable types

Numeric: 9
Categorical: 2

Warnings

Dataset has 4 (0.1%) duplicate rows (Duplicates)
Year Built is highly correlated with SalePrice (High correlation)
Full Bath is highly correlated with Gr Liv Area and 1 other field (High correlation)
Gr Liv Area is highly correlated with Full Bath and 1 other field (High correlation)
SalePrice is highly correlated with Year Built and 2 other fields (High correlation)
Year Built is highly correlated with Full Bath and 1 other field (High correlation)
Full Bath is highly correlated with Year Built and 2 other fields (High correlation)
Gr Liv Area is highly correlated with Full Bath and 1 other field (High correlation)
SalePrice is highly correlated with Year Built and 2 other fields (High correlation)
Full Bath is highly correlated with Gr Liv Area and 1 other field (High correlation)
Gr Liv Area is highly correlated with Full Bath and 1 other field (High correlation)
SalePrice is highly correlated with Full Bath and 1 other field (High correlation)
Full Bath is highly correlated with SalePrice and 2 other fields (High correlation)
MS Zoning is highly correlated with Neighborhood (High correlation)
Neighborhood is highly correlated with MS Zoning and 2 other fields (High correlation)
SalePrice is highly correlated with Full Bath and 2 other fields (High correlation)
Year Built is highly correlated with Full Bath and 5 other fields (High correlation)
Overall Cond is highly correlated with Year Built (High correlation)
MS SubClass is highly correlated with Neighborhood and 2 other fields (High correlation)
House Style is highly correlated with Year Built and 1 other field (High correlation)
Gr Liv Area is highly correlated with Full Bath and 1 other field (High correlation)
House Style has 211 (7.2%) missing values (Missing)
House Style has 314 (10.7%) zeros (Zeros)

Reproduction

Analysis started: 2021-09-25 21:53:15.384659
Analysis finished: 2021-09-25 21:53:37.025930
Duration: 21.64 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

MS SubClass
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 16
Distinct (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 57.38737201
Minimum: 20
Maximum: 190
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 20
5-th percentile: 20
Q1: 20
median: 50
Q3: 70
95-th percentile: 160
Maximum: 190
Range: 170
Interquartile range (IQR): 50

Descriptive statistics

Standard deviation: 42.63802455
Coefficient of variation (CV): 0.7429861842
Kurtosis: 1.386774989
Mean: 57.38737201
Median Absolute Deviation (MAD): 30
Skewness: 1.357579441
Sum: 168145
Variance: 1818.001138
Monotonicity: Not monotonic
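Three of the dispersion measures above are simple functions of other rows: CV is the standard deviation divided by the mean (42.63802455 / 57.38737201 ≈ 0.74299), IQR is Q3 minus Q1 (70 − 20 = 50), and MAD is the median of the absolute deviations from the median. The sketch below computes all three with the standard library. It assumes the sample (n − 1) standard deviation, which is pandas' default for `Series.std`, and linear-interpolation quartiles; these are assumptions about the report's internals, not confirmed by it.

```python
import statistics


def dispersion_summary(values):
    """Sketch of CV, IQR and MAD for a list of numbers (not pandas-profiling's code).

    Assumptions: sample standard deviation (n - 1 denominator), quartiles by
    linear interpolation, and MAD = median(|x - median(x)|).
    """
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    return {"cv": std / mean, "iqr": q3 - q1, "mad": mad}


print(dispersion_summary([1, 2, 3, 4, 5]))
```

For MS SubClass, applying the MAD definition to the value counts gives 30 (the deviations |20 − 50| and |80 − 50| alone cover the middle of the distribution), consistent with the reported value.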
Histogram with fixed size bins (bins=16)
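Common Values
Value | Count | Frequency (%)
20 | 1079 | 36.8%
60 | 575 | 19.6%
50 | 287 | 9.8%
120 | 192 | 6.6%
30 | 139 | 4.7%
160 | 129 | 4.4%
70 | 128 | 4.4%
80 | 118 | 4.0%
90 | 109 | 3.7%
190 | 61 | 2.1%
Other values (6) | 113 | 3.9%

Minimum values
Value | Count | Frequency (%)
20 | 1079 | 36.8%
30 | 139 | 4.7%
40 | 6 | 0.2%
45 | 18 | 0.6%
50 | 287 | 9.8%
60 | 575 | 19.6%
70 | 128 | 4.4%
75 | 23 | 0.8%
80 | 118 | 4.0%
85 | 48 | 1.6%

Maximum values
Value | Count | Frequency (%)
190 | 61 | 2.1%
180 | 17 | 0.6%
160 | 129 | 4.4%
150 | 1 | < 0.1%
120 | 192 | 6.6%
90 | 109 | 3.7%
85 | 48 | 1.6%
80 | 118 | 4.0%
75 | 23 | 0.8%
70 | 128 | 4.4%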
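The "Common Values" table lists the ten most frequent values and folds the rest into one row; for MS SubClass the six remaining codes (40, 45, 75, 85, 150, 180) account for 113 rows, or 3.9%. A minimal sketch of that layout follows. It assumes frequencies are taken over all rows passed in and rounded to one decimal; ties keep `Counter.most_common`'s first-encountered order, which may not match the report's ordering.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Top-k (value, count, percent) rows plus an 'Other values (n)' row.

    Assumptions: percentages are over len(values) and rounded to one decimal.
    """
    counts = Counter(values)
    total = len(values)
    rows = [(v, n, round(100 * n / total, 1)) for v, n in counts.most_common(top)]
    rest = counts.most_common()[top:]
    if rest:  # collapse every value outside the top k into one summary row
        n_rest = sum(n for _, n in rest)
        rows.append((f"Other values ({len(rest)})", n_rest, round(100 * n_rest / total, 1)))
    return rows


for row in frequency_table(["a", "a", "b", "c"], top=1):
    print(row)
```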

MS Zoning
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 7
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 136.1617747
Minimum: 9
Maximum: 142
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 9
5-th percentile: 58
Q1: 141
median: 141
Q3: 141
95-th percentile: 142
Maximum: 142
Range: 133
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 20.42541177
Coefficient of variation (CV): 0.1500084132
Kurtosis: 13.81585022
Mean: 136.1617747
Median Absolute Deviation (MAD): 0
Skewness: -3.922467537
Sum: 398954
Variance: 417.1974458
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=7)
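Common Values
Value | Count | Frequency (%)
141 | 2273 | 77.6%
142 | 462 | 15.8%
58 | 139 | 4.7%
140 | 27 | 0.9%
33 | 25 | 0.9%
86 | 2 | 0.1%
9 | 2 | 0.1%

Minimum values
Value | Count | Frequency (%)
9 | 2 | 0.1%
33 | 25 | 0.9%
58 | 139 | 4.7%
86 | 2 | 0.1%
140 | 27 | 0.9%
141 | 2273 | 77.6%
142 | 462 | 15.8%

Maximum values
Value | Count | Frequency (%)
142 | 462 | 15.8%
141 | 2273 | 77.6%
140 | 27 | 0.9%
86 | 2 | 0.1%
58 | 139 | 4.7%
33 | 25 | 0.9%
9 | 2 | 0.1%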

Year Built
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 118
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1971.356314
Minimum: 1872
Maximum: 2010
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 1872
5-th percentile: 1915
Q1: 1954
median: 1973
Q3: 2001
95-th percentile: 2007
Maximum: 2010
Range: 138
Interquartile range (IQR): 47

Descriptive statistics

Standard deviation: 30.24536063
Coefficient of variation (CV): 0.01534241193
Kurtosis: -0.5017150401
Mean: 1971.356314
Median Absolute Deviation (MAD): 25
Skewness: -0.6044622214
Sum: 5776074
Variance: 914.7818396
Monotonicity: Not monotonic
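Year Built has a negative skewness (−0.604, a longer tail toward older houses) and a negative kurtosis (−0.50, flatter than a normal distribution). The sketch below computes both from central moments. It assumes the report uses the bias-adjusted sample estimators G1 (skewness) and G2 (excess kurtosis), which are the ones pandas' `Series.skew()` and `Series.kurt()` implement; whether pandas-profiling calls exactly those methods is an assumption here.

```python
import math


def sample_skew_kurt(values):
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2).

    Sketch only; needs at least 4 values with non-zero spread.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt


print(sample_skew_kurt([1, 2, 3, 4, 5]))
```

For the symmetric sample 1..5 the skewness is 0 and the excess kurtosis is about −1.2; a single large value, as in `[1, 1, 1, 10]`, pushes the skewness positive.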
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
2005 | 142 | 4.8%
2006 | 138 | 4.7%
2007 | 109 | 3.7%
2004 | 99 | 3.4%
2003 | 88 | 3.0%
1977 | 57 | 1.9%
1920 | 57 | 1.9%
1976 | 54 | 1.8%
1999 | 52 | 1.8%
2008 | 49 | 1.7%
Other values (108) | 2085 | 71.2%

Minimum values
Value | Count | Frequency (%)
1872 | 1 | < 0.1%
1875 | 1 | < 0.1%
1879 | 1 | < 0.1%
1880 | 5 | 0.2%
1882 | 1 | < 0.1%
1885 | 2 | 0.1%
1890 | 7 | 0.2%
1892 | 2 | 0.1%
1893 | 1 | < 0.1%
1895 | 3 | 0.1%

Maximum values
Value | Count | Frequency (%)
2010 | 3 | 0.1%
2009 | 25 | 0.9%
2008 | 49 | 1.7%
2007 | 109 | 3.7%
2006 | 138 | 4.7%
2005 | 142 | 4.8%
2004 | 99 | 3.4%
2003 | 88 | 3.0%
2002 | 47 | 1.6%
2001 | 35 | 1.2%

Lot Area
Real number (ℝ≥0)

Distinct: 1960
Distinct (%): 66.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 10147.92184
Minimum: 1300
Maximum: 215245
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 1300
5-th percentile: 3188.3
Q1: 7440.25
median: 9436.5
Q3: 11555.25
95-th percentile: 17131
Maximum: 215245
Range: 213945
Interquartile range (IQR): 4115

Descriptive statistics

Standard deviation: 7880.017759
Coefficient of variation (CV): 0.7765154168
Kurtosis: 265.0236706
Mean: 10147.92184
Median Absolute Deviation (MAD): 2040
Skewness: 12.82089817
Sum: 29733411
Variance: 62094679.89
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
9600 | 44 | 1.5%
7200 | 43 | 1.5%
6000 | 34 | 1.2%
9000 | 29 | 1.0%
10800 | 25 | 0.9%
7500 | 21 | 0.7%
8400 | 21 | 0.7%
1680 | 18 | 0.6%
6240 | 18 | 0.6%
6120 | 17 | 0.6%
Other values (1950) | 2660 | 90.8%

Minimum values
Value | Count | Frequency (%)
1300 | 1 | < 0.1%
1470 | 1 | < 0.1%
1476 | 1 | < 0.1%
1477 | 2 | 0.1%
1484 | 1 | < 0.1%
1488 | 1 | < 0.1%
1491 | 1 | < 0.1%
1495 | 1 | < 0.1%
1504 | 1 | < 0.1%
1526 | 2 | 0.1%

Maximum values
Value | Count | Frequency (%)
215245 | 1 | < 0.1%
164660 | 1 | < 0.1%
159000 | 1 | < 0.1%
115149 | 1 | < 0.1%
70761 | 1 | < 0.1%
63887 | 1 | < 0.1%
57200 | 1 | < 0.1%
56600 | 1 | < 0.1%
53504 | 1 | < 0.1%
53227 | 1 | < 0.1%
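"Fixed size bins (bins=50)" means the range from minimum to maximum is cut into 50 equal-width intervals. For Lot Area each bin is 213,945 / 50 ≈ 4,278.9 wide, so with the 95th percentile at 17,131 (below 1,300 + 4 × 4,278.9 ≈ 18,415.6) at least 95% of the rows fall into the first four bins, which is what the extreme skewness and kurtosis above reflect. In this report the bin count also appears to be capped at the number of distinct values (bins=16 for the 16 MS SubClass codes, bins=7 for MS Zoning). The sketch below is a minimal stdlib version; it closes the last bin on the right so the maximum is counted, the same convention `numpy.histogram` uses, but it is not the report's own implementation.

```python
def fixed_width_histogram(values, bins):
    """Count values into `bins` equal-width bins spanning [min, max].

    All bins are half-open [a, b) except the last, which is closed [a, b].
    Assumes max(values) > min(values).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum into the last bin
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return counts, edges


print(fixed_width_histogram([0, 1, 2, 3, 4], 2))
```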

Neighborhood
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 28
Distinct (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 102.7133106
Minimum: 22
Maximum: 173
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 22
5-th percentile: 30
Q1: 53
median: 113
Q3: 125
95-th percentile: 161.1
Maximum: 173
Range: 151
Interquartile range (IQR): 72

Descriptive statistics

Standard deviation: 43.09099038
Coefficient of variation (CV): 0.4195268377
Kurtosis: -1.104310069
Mean: 102.7133106
Median Absolute Deviation (MAD): 36
Skewness: -0.2794473136
Sum: 300950
Variance: 1856.833452
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=28)
Common Values
Value | Count | Frequency (%)
113 | 443 | 15.1%
42 | 267 | 9.1%
125 | 239 | 8.2%
53 | 194 | 6.6%
160 | 182 | 6.2%
124 | 166 | 5.7%
77 | 165 | 5.6%
155 | 151 | 5.2%
115 | 131 | 4.5%
156 | 125 | 4.3%
Other values (18) | 867 | 29.6%

Minimum values
Value | Count | Frequency (%)
22 | 28 | 1.0%
23 | 10 | 0.3%
25 | 30 | 1.0%
30 | 108 | 3.7%
39 | 44 | 1.5%
42 | 267 | 9.1%
49 | 103 | 3.5%
53 | 194 | 6.6%
77 | 165 | 5.6%
79 | 8 | 0.3%

Maximum values
Value | Count | Frequency (%)
173 | 24 | 0.8%
167 | 72 | 2.5%
162 | 51 | 1.7%
160 | 182 | 6.2%
156 | 125 | 4.3%
155 | 151 | 5.2%
153 | 48 | 1.6%
125 | 239 | 8.2%
124 | 166 | 5.7%
118 | 71 | 2.4%

House Style
Real number (ℝ≥0)

HIGH CORRELATION
MISSING
ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 211
Missing (%): 7.2%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.623390953
Minimum: 0
Maximum: 6
Zeros: 314
Zeros (%): 10.7%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 3
median: 3
Q3: 6
95-th percentile: 6
Maximum: 6
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.899455049
Coefficient of variation (CV): 0.5242202881
Kurtosis: -0.6469645692
Mean: 3.623390953
Median Absolute Deviation (MAD): 0
Skewness: -0.2072354553
Sum: 9852
Variance: 3.607929483
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
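Common Values
Value | Count | Frequency (%)
3 | 1481 | 50.5%
6 | 873 | 29.8%
0 | 314 | 10.7%
5 | 24 | 0.8%
1 | 19 | 0.6%
4 | 8 | 0.3%
(Missing) | 211 | 7.2%

Minimum values
Value | Count | Frequency (%)
0 | 314 | 10.7%
1 | 19 | 0.6%
3 | 1481 | 50.5%
4 | 8 | 0.3%
5 | 24 | 0.8%
6 | 873 | 29.8%

Maximum values
Value | Count | Frequency (%)
6 | 873 | 29.8%
5 | 24 | 0.8%
4 | 8 | 0.3%
3 | 1481 | 50.5%
1 | 19 | 0.6%
0 | 314 | 10.7%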
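House Style is the only column with missing values, and its figures show two different denominators: Missing (211 / 2930 ≈ 7.2%) and Zeros (314 / 2930 ≈ 10.7%) are relative to all rows, while the mean uses only the 2719 non-missing values (9852 / 2719 ≈ 3.6234). The sketch below mirrors that split. It assumes `None` marks a missing entry; this is an illustration of the arithmetic, not pandas-profiling's code.

```python
def missing_and_zero_summary(values):
    """Missing/zero counts for one column, with None marking a missing value.

    Missing and Zeros percentages are over all rows; the mean skips missing values.
    """
    n = len(values)
    present = [v for v in values if v is not None]
    n_missing = n - len(present)
    n_zeros = sum(1 for v in present if v == 0)
    return {
        "missing": n_missing,
        "missing_pct": round(100 * n_missing / n, 1),
        "zeros": n_zeros,
        "zeros_pct": round(100 * n_zeros / n, 1),
        "mean": sum(present) / len(present) if present else float("nan"),
    }


print(missing_and_zero_summary([0, 3, None, 6]))
```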

Overall Cond
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 9
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.563139932
Minimum: 1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 1
5-th percentile: 4
Q1: 5
median: 5
Q3: 6
95-th percentile: 8
Maximum: 9
Range: 8
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.11153656
Coefficient of variation (CV): 0.1998038111
Kurtosis: 1.491449722
Mean: 5.563139932
Median Absolute Deviation (MAD): 0
Skewness: 0.574429477
Sum: 16300
Variance: 1.235513524
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
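Common Values
Value | Count | Frequency (%)
5 | 1654 | 56.5%
6 | 533 | 18.2%
7 | 390 | 13.3%
8 | 144 | 4.9%
4 | 101 | 3.4%
3 | 50 | 1.7%
9 | 41 | 1.4%
2 | 10 | 0.3%
1 | 7 | 0.2%

Minimum values
Value | Count | Frequency (%)
1 | 7 | 0.2%
2 | 10 | 0.3%
3 | 50 | 1.7%
4 | 101 | 3.4%
5 | 1654 | 56.5%
6 | 533 | 18.2%
7 | 390 | 13.3%
8 | 144 | 4.9%
9 | 41 | 1.4%

Maximum values
Value | Count | Frequency (%)
9 | 41 | 1.4%
8 | 144 | 4.9%
7 | 390 | 13.3%
6 | 533 | 18.2%
5 | 1654 | 56.5%
4 | 101 | 3.4%
3 | 50 | 1.7%
2 | 10 | 0.3%
1 | 7 | 0.2%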

Full Bath
Categorical

HIGH CORRELATION

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 23.0 KiB
Value | Count
2 | 1532
1 | 1318
3 | 64
0 | 12
4 | 4

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 2930
Distinct characters: 5
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
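Python's `unicodedata` module exposes one of these properties, the general category, so the character tables in this section can be sketched with the standard library. The example counts characters across a list of strings, groups them by general category, and counts ASCII characters. Assumptions: the list of strings stands in for the column; `CATEGORY_NAMES` is a partial lookup covering only three general categories; Unicode scripts (the report's "Common") are not available in the standard library, so the sketch skips them. For Yr Sold, every year contains two zeros, giving 2 × 2930 = 5860 zeros as in the table further down.

```python
import unicodedata
from collections import Counter

# Partial map from unicodedata's two-letter general-category codes to names.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Lu": "Uppercase Letter", "Ll": "Lowercase Letter"}


def character_profile(strings):
    """Count characters, Unicode general categories, and ASCII characters."""
    chars = Counter(ch for s in strings for ch in s)
    categories = Counter()
    ascii_count = 0
    for ch, n in chars.items():
        code = unicodedata.category(ch)  # e.g. "Nd" for the digits 0-9
        categories[CATEGORY_NAMES.get(code, code)] += n
        if ord(ch) < 128:  # code points 0-127 form the ASCII block
            ascii_count += n
    return chars, categories, ascii_count


print(character_profile(["2007", "2010"]))
```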

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 2
5th row: 2

Common Values

Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 2930 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2930 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2930 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
2 | 1532 | 52.3%
1 | 1318 | 45.0%
3 | 64 | 2.2%
0 | 12 | 0.4%
4 | 4 | 0.1%

Gr Liv Area
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1292
Distinct (%): 44.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1499.690444
Minimum: 334
Maximum: 5642
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 334
5-th percentile: 861
Q1: 1126
median: 1442
Q3: 1742.75
95-th percentile: 2463.1
Maximum: 5642
Range: 5308
Interquartile range (IQR): 616.75

Descriptive statistics

Standard deviation: 505.5088875
Coefficient of variation (CV): 0.3370754875
Kurtosis: 4.137838193
Mean: 1499.690444
Median Absolute Deviation (MAD): 311
Skewness: 1.274109716
Sum: 4394093
Variance: 255539.2353
Monotonicity: Not monotonic
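Gr Liv Area takes whole-number values, yet Q3 is 1742.75. That is the signature of linear interpolation between order statistics: with n = 2930 and zero-based positions q × (n − 1), the Q3 position is 0.75 × 2929 = 2196.75, so the result lies three quarters of the way between the 2197th and 2198th smallest values. The sketch below implements that rule; it assumes it matches the default interpolation of pandas' `Series.quantile`, which the report does not state.

```python
import math


def linear_quantile(values, q):
    """q-th quantile (0 <= q <= 1) by linear interpolation between order statistics."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)        # zero-based fractional position
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)  # stay in range when pos is the last index
    frac = pos - lo
    return xs[lo] + frac * (xs[hi] - xs[lo])


print(linear_quantile([1, 2, 3, 4], 0.25))
```

Here position 0.75 lies between 1 and 2, so the result is 1.75.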
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
864 | 41 | 1.4%
1092 | 26 | 0.9%
1040 | 25 | 0.9%
1456 | 20 | 0.7%
1200 | 18 | 0.6%
894 | 15 | 0.5%
912 | 14 | 0.5%
816 | 14 | 0.5%
848 | 13 | 0.4%
1728 | 13 | 0.4%
Other values (1282) | 2731 | 93.2%

Minimum values
Value | Count | Frequency (%)
334 | 1 | < 0.1%
407 | 1 | < 0.1%
438 | 1 | < 0.1%
480 | 1 | < 0.1%
492 | 1 | < 0.1%
498 | 1 | < 0.1%
520 | 1 | < 0.1%
540 | 1 | < 0.1%
572 | 1 | < 0.1%
599 | 1 | < 0.1%

Maximum values
Value | Count | Frequency (%)
5642 | 1 | < 0.1%
5095 | 1 | < 0.1%
4676 | 1 | < 0.1%
4476 | 1 | < 0.1%
4316 | 1 | < 0.1%
3820 | 1 | < 0.1%
3672 | 1 | < 0.1%
3627 | 1 | < 0.1%
3608 | 1 | < 0.1%
3500 | 1 | < 0.1%

Yr Sold
Categorical

Distinct: 5
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 23.0 KiB
Value | Count
2007 | 694
2009 | 648
2006 | 625
2008 | 622
2010 | 341

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 11720
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2010
2nd row: 2010
3rd row: 2010
4th row: 2010
5th row: 2010

Common Values

Value | Count | Frequency (%)
2007 | 694 | 23.7%
2009 | 648 | 22.1%
2006 | 625 | 21.3%
2008 | 622 | 21.2%
2010 | 341 | 11.6%

Length

Histogram of lengths of the category

Pie chart

Value | Count | Frequency (%)
2007 | 694 | 23.7%
2009 | 648 | 22.1%
2006 | 625 | 21.3%
2008 | 622 | 21.2%
2010 | 341 | 11.6%

Most occurring characters

Value | Count | Frequency (%)
0 | 5860 | 50.0%
2 | 2930 | 25.0%
7 | 694 | 5.9%
9 | 648 | 5.5%
6 | 625 | 5.3%
8 | 622 | 5.3%
1 | 341 | 2.9%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 11720 | 100.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 5860 | 50.0%
2 | 2930 | 25.0%
7 | 694 | 5.9%
9 | 648 | 5.5%
6 | 625 | 5.3%
8 | 622 | 5.3%
1 | 341 | 2.9%

Most occurring scripts

Value | Count | Frequency (%)
Common | 11720 | 100.0%

Most frequent character per script

Common
Value | Count | Frequency (%)
0 | 5860 | 50.0%
2 | 2930 | 25.0%
7 | 694 | 5.9%
9 | 648 | 5.5%
6 | 625 | 5.3%
8 | 622 | 5.3%
1 | 341 | 2.9%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 11720 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
0 | 5860 | 50.0%
2 | 2930 | 25.0%
7 | 694 | 5.9%
9 | 648 | 5.5%
6 | 625 | 5.3%
8 | 622 | 5.3%
1 | 341 | 2.9%

SalePrice
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 1032
Distinct (%): 35.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 180796.0601
Minimum: 12789
Maximum: 755000
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 23.0 KiB

Quantile statistics

Minimum: 12789
5-th percentile: 87500
Q1: 129500
median: 160000
Q3: 213500
95-th percentile: 335000
Maximum: 755000
Range: 742211
Interquartile range (IQR): 84000

Descriptive statistics

Standard deviation: 79886.69236
Coefficient of variation (CV): 0.4418608034
Kurtosis: 5.118899951
Mean: 180796.0601
Median Absolute Deviation (MAD): 37000
Skewness: 1.743500076
Sum: 529732456
Variance: 6381883616
Monotonicity: Not monotonic
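The "Monotonicity" row reports whether a column's values only rise or only fall in row order. SalePrice is not monotonic; its first three sampled values (215000, 105000, 172000) already fall and then rise. The sketch below classifies a sequence with non-strict comparisons, the same convention as pandas' `Series.is_monotonic_increasing` and `Series.is_monotonic_decreasing`. The exact labels and whether the report distinguishes strict from non-strict monotonicity are assumptions here.

```python
def monotonicity(values):
    """Classify a sequence as 'Increasing', 'Decreasing' or 'Not monotonic'.

    Equal neighbours are allowed in both directions (non-strict monotonicity).
    """
    pairs = list(zip(values, values[1:]))  # consecutive (previous, next) pairs
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"


print(monotonicity([215000, 105000, 172000]))
```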
Histogram with fixed size bins (bins=50)
Common Values
Value | Count | Frequency (%)
135000 | 34 | 1.2%
140000 | 33 | 1.1%
130000 | 29 | 1.0%
155000 | 28 | 1.0%
145000 | 26 | 0.9%
160000 | 23 | 0.8%
110000 | 21 | 0.7%
185000 | 21 | 0.7%
115000 | 20 | 0.7%
170000 | 20 | 0.7%
Other values (1022) | 2675 | 91.3%

Minimum values
Value | Count | Frequency (%)
12789 | 1 | < 0.1%
13100 | 1 | < 0.1%
34900 | 1 | < 0.1%
35000 | 1 | < 0.1%
35311 | 1 | < 0.1%
37900 | 1 | < 0.1%
39300 | 1 | < 0.1%
40000 | 1 | < 0.1%
44000 | 1 | < 0.1%
45000 | 1 | < 0.1%

Maximum values
Value | Count | Frequency (%)
755000 | 1 | < 0.1%
745000 | 1 | < 0.1%
625000 | 1 | < 0.1%
615000 | 1 | < 0.1%
611657 | 1 | < 0.1%
610000 | 1 | < 0.1%
591587 | 1 | < 0.1%
584500 | 1 | < 0.1%
582933 | 1 | < 0.1%
556581 | 1 | < 0.1%

Interactions

[81 pairwise interaction plots covering the 9 numeric variables (9 × 9); images not preserved]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and 1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for an exact linear relationship the steepness of the line does not affect r; only the direction of the slope (its sign) does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
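The definition can be written directly in a few lines. In the sketch below the 1/n (or 1/(n − 1)) factors of the covariance and the two standard deviations cancel, so raw sums of deviations suffice. It is a from-scratch illustration, not the report's implementation, and assumes neither variable is constant.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)  # the normalising factors cancel, so raw sums suffice


# Rescaling X by 10 and mapping Y to 3*Y + 4 leaves r unchanged.
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 5]), pearson_r([10, 20, 30, 40], [7, 13, 10, 19]))
```

Both calls print the same value (up to floating-point rounding), illustrating the invariance under changes of location and positive scale.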

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and 1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
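Spearman's ρ is Pearson's r applied to ranks. The sketch below assigns tied values the average of the positions they occupy (a common convention, assumed here rather than confirmed for this report) and then applies the covariance-over-standard-deviations formula to the rank variables. Because only the order matters, a monotonic but nonlinear relation such as y = x³ gets ρ = 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # 0-based positions i..j share one 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```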

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and 1 indicating total positive correlation.

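To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n − 1)/2. This is the τ-a form; when the data contain ties, a tie-corrected variant (τ-b) is commonly used instead.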
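The concordant/discordant counting translates directly into code. The sketch below is the plain τ-a form with an O(n²) loop over all pairs: a pair is concordant when X and Y move in the same direction, discordant when they move in opposite directions, and tied pairs count as neither. Columns with many ties, such as Full Bath or Overall Cond here, are where tie-corrected variants differ most from this version, so it should not be read as reproducing the report's Kendall values exactly.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(xs)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1  # X and Y move in the same direction
        elif s < 0:
            discordant += 1  # X and Y move in opposite directions
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example, 5 of the 6 pairs are concordant and 1 is discordant, giving (5 − 1) / 6 ≈ 0.667.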

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
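The bias correction adjusts both the squared association φ² = χ²/n and the table dimensions before taking the ratio. The sketch below follows the form in which Bergsma's correction is commonly stated: φ² is reduced by (k − 1)(r − 1)/(n − 1) and floored at zero, and r and k are each reduced by (dimension − 1)²/(n − 1). It is a from-scratch illustration under that assumption, not the report's code, and it requires every row and column total of the contingency table to be non-zero.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramer's V for an r x k contingency table (list of lists).

    Assumes every row and column total is non-zero.
    """
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n  # count expected under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


print(cramers_v_corrected([[50, 0], [0, 50]]), cramers_v_corrected([[25, 25], [25, 25]]))
```

A perfectly associated 2 × 2 table gives a value of 1 (up to rounding), and a table with identical rows gives 0.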

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

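Index | MS SubClass | MS Zoning | Year Built | Lot Area | Neighborhood | House Style | Overall Cond | Full Bath | Gr Liv Area | Yr Sold | SalePrice
0 | 20 | 141.0 | 1960 | 31770 | 113.0 | 3.0 | 5 | 1 | 1656 | 2010 | 215000
1 | 20 | 140.0 | 1961 | 11622 | 113.0 | 3.0 | 6 | 1 | 896 | 2010 | 105000
2 | 20 | 141.0 | 1958 | 14267 | 113.0 | 3.0 | 6 | 1 | 1329 | 2010 | 172000
3 | 20 | 141.0 | 1968 | 11160 | 113.0 | 3.0 | 5 | 2 | 2110 | 2010 | 244000
4 | 60 | 141.0 | 1997 | 13830 | 77.0 | 6.0 | 5 | 2 | 1629 | 2010 | 189900
5 | 60 | 141.0 | 1998 | 9978 | 77.0 | 6.0 | 6 | 2 | 1604 | 2010 | 195500
6 | 120 | 141.0 | 2001 | 4920 | 162.0 | 3.0 | 5 | 2 | 1338 | 2010 | 213500
7 | 120 | 141.0 | 1992 | 5005 | 162.0 | 3.0 | 5 | 2 | 1280 | 2010 | 191500
8 | 120 | 141.0 | 1995 | 5389 | 162.0 | 3.0 | 5 | 2 | 1616 | 2010 | 236500
9 | 60 | 141.0 | 1999 | 7500 | 77.0 | 6.0 | 5 | 2 | 1804 | 2010 | 189000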

Last rows

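Index | MS SubClass | MS Zoning | Year Built | Lot Area | Neighborhood | House Style | Overall Cond | Full Bath | Gr Liv Area | Yr Sold | SalePrice
2920 | 160 | 142.0 | 1970 | 1894 | 100.0 | 6.0 | 5 | 1 | 1092 | 2006 | 71000
2921 | 90 | 141.0 | 1976 | 12640 | 106.0 | 3.0 | 5 | 2 | 1728 | 2006 | 150900
2922 | 90 | 141.0 | 1976 | 9297 | 106.0 | 3.0 | 5 | 2 | 1728 | 2006 | 188000
2923 | 20 | 141.0 | 1977 | 17400 | 106.0 | 3.0 | 5 | 2 | 1126 | 2006 | 160000
2924 | 20 | 141.0 | 1960 | 20000 | 106.0 | 3.0 | 7 | 1 | 1224 | 2006 | 131000
2925 | 80 | 141.0 | 1984 | 7937 | 106.0 | NaN | 6 | 1 | 1003 | 2006 | 142500
2926 | 20 | 141.0 | 1983 | 8885 | 106.0 | 3.0 | 5 | 1 | 902 | 2006 | 131000
2927 | 85 | 141.0 | 1992 | 10441 | 106.0 | NaN | 5 | 1 | 970 | 2006 | 132000
2928 | 20 | 141.0 | 1974 | 10010 | 106.0 | 3.0 | 5 | 1 | 1389 | 2006 | 170000
2929 | 60 | 141.0 | 1993 | 9627 | 106.0 | 6.0 | 5 | 2 | 2000 | 2006 | 188000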

Duplicate rows

Most frequently occurring

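Index | MS SubClass | MS Zoning | Year Built | Lot Area | Neighborhood | House Style | Overall Cond | Full Bath | Gr Liv Area | Yr Sold | SalePrice | # duplicates
0 | 90 | 141.0 | 1979 | 7018 | 156.0 | 3.0 | 5 | 2 | 1535 | 2009 | 118858 | 2
1 | 90 | 141.0 | 1987 | 10800 | 53.0 | 3.0 | 5 | 3 | 1200 | 2009 | 179000 | 2
2 | 120 | 141.0 | 2006 | 4590 | 124.0 | 3.0 | 5 | 2 | 1554 | 2007 | 209500 | 2
3 | 160 | 142.0 | 2004 | 2522 | 53.0 | 6.0 | 5 | 2 | 1709 | 2006 | 130000 | 2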
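This table lists four distinct rows, each occurring twice ("# duplicates" = 2). A minimal way to build such a list is to count identical rows and keep those seen more than once, most frequent first. The sketch below assumes rows are hashable tuples with missing cells encoded as `None` (a hypothetical convention, chosen so that two missing cells compare equal); it is an illustration, not the report's implementation.

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """(row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(rows)
    dups = [(row, n) for row, n in counts.most_common() if n > 1]
    return dups[:top]


print(duplicate_summary([(1, 2), (3, 4), (1, 2), (5, 6), (3, 4), (1, 2)]))
```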